Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: add top-level melt function as method #15521

Closed
wants to merge 7 commits into from

Conversation

ResidentMario
Copy link
Contributor

@ResidentMario ResidentMario commented Feb 27, 2017

This is continued from PR#15513.

@ResidentMario
Copy link
Contributor Author

How should this be tested?

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

where pd.melt is currently tests, you can add in a a test like

test_api_method or something and just copy an existing tests and subtitute DataFrame.melt for pd.melt (or changing a tests is ok too)

@jreback jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 27, 2017
@ResidentMario
Copy link
Contributor Author

Copying an entire test case would cause a lot of duplication though. Since DataFrame.melt is supposed to have the same method signature as pd.melt, can't we just replace melt with DataFrame.melt in the test bodies?

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

@ResidentMario yes you can simply replace them (though I would still add a single function to test the api (e.g. that pd.melt works)

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

also add this to api.rst. and in the reshaing.rst I would change the docs.

@ResidentMario
Copy link
Contributor Author

And and addition to whatsnew, presumably. Where would that go under there?

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

you could put it in Enhancements

@codecov-io
Copy link

codecov-io commented Feb 27, 2017

Codecov Report

Merging #15521 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15521      +/-   ##
==========================================
+ Coverage   90.97%   90.97%   +<.01%     
==========================================
  Files         145      145              
  Lines       49483    49492       +9     
==========================================
+ Hits        45015    45024       +9     
  Misses       4468     4468
Flag Coverage Δ
#multiple 88.73% <100%> (ø) ⬆️
#single 40.63% <77.77%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.57% <100%> (ø) ⬆️
pandas/core/reshape.py 99.28% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da0523a...1657246. Read the comment docs.

@ResidentMario
Copy link
Contributor Author

gtm?

@@ -160,6 +160,7 @@ Other enhancements
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)

- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` has gained a ``melt()`` method for unpivoting from a wide to long format (:issue:`12640`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that this is the same as pd.melt

@jreback
Copy link
Contributor

jreback commented Mar 2, 2017

lgtm. @jorisvandenbossche

@jorisvandenbossche
Copy link
Member

Will take a look tomorrow!

@ResidentMario
Copy link
Contributor Author

@jorisvandenbossche bump :)

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

can you rebase, this looks good though (mainly to fix the whatsnew)

@jreback jreback added this to the 0.20.0 milestone Apr 3, 2017
@ResidentMario
Copy link
Contributor Author

Rebased.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few doc comments

@@ -105,7 +105,9 @@
optional_by="""
by : str or list of str
Name or list of names which refer to the axis items.""",
versionadded_to_excel='')
versionadded_to_excel='',
versionadded_melt='\n.. versionadded:: 0.20.0\n',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs extra spaces (you can check pd.DataFrame.melt? if you are on this branch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by this, can you clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do pd.DataFrame.melt? you see:

Signature: pd.DataFrame.melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Docstring:
    "Unpivots" a DataFrame from wide format to long format, optionally leaving
    identifier variables set.

    This function is useful to massage a DataFrame into a format where one
    or more columns are identifier variables (`id_vars`), while all other
    columns, considered measured variables (`value_vars`), are "unpivoted" to
    the row axis, leaving just two non-identifier columns, 'variable' and
    'value'.

.. versionadded:: 0.20.0


    Parameters
    ----------
    ....

so the versionadded is not indented as the other lines.

But I think Jeff fixed it before merging

@@ -265,7 +265,7 @@ the right thing:
Reshaping by Melt
-----------------

The :func:`~pandas.melt` function is useful to massage a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention here both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (GitHub is not folding this review component for some reason, however).

columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blank line can be removed

@jorisvandenbossche
Copy link
Member

@ResidentMario Sorry my 'tomorrow' took a lot longer .. !

To be clear: I am in the meantime fully convinced of the usefulness of melt, and having it as method on a DataFrame as counterpart of the pivot method is really nice.

My main concern I have (or maybe not really concern, bur rather opportunity) is about the exact API. There have been raised some issues about this (#10109, #14876 (comment)), so I only think we should take this as an opportunity to look at the current melt API and see if we would implement it in the same way now.
And conclusion can well be that we just want to keep it as is.

For example, IMO it would be useful to make a comparison between the R melt and gather, see how Hadley has improved it in the new gather version and see if we can learn something of it.

If we don't find the time for this before 0.20 release, I am OK with merging this.

@jreback
Copy link
Contributor

jreback commented Apr 4, 2017

ok @ResidentMario pls make those corrections and we can merge. If after this (in next week or two) we can figure out a better API we can discuss, otherwise can do it in next major version.

Note that also like to incorporate (and deprecate) wide_to_long here.

btw should also look at cast and reshape R functions for inspiration.

@jorisvandenbossche
Copy link
Member

-1 on incoroporating wide_to_long. As far as I know, it is hardly used, and has an API that is not really 'pandonic'.

@ResidentMario
Copy link
Contributor Author

ResidentMario commented Apr 4, 2017

stack, unstack, melt, etcetera...agree that studying the possible advantages of data transforms in other languages would be a good thing.

I don't see this being done in the next two weeks however.

@jreback
Copy link
Contributor

jreback commented Apr 4, 2017

@ResidentMario

stack, unstack, melt, etcetera...agree that studying the possible advantages of data transforms in other languages would be a good thing.

I don't see this being done in the next two weeks however.

not a problem

@jreback
Copy link
Contributor

jreback commented Apr 4, 2017

thanks @ResidentMario

I made the doc-string a tiny bit more flexible (so the examples are more clear), but great patch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants